A Multivariate Two-Sample Test using the Jaccard Distance
Abstract
A common need in statistics is to assess whether two samples come from the same underlying population distribution. Existing two-sample tests often make limiting a priori assumptions, or cannot be easily generalized to multivariate data. We derive a new multivariate two-sample test that makes no a priori assumptions, has higher statistical power than previous tests, has better runtime performance, has an easily understood geometrical interpretation, and is simple to implement.
Similar resources
Jaccard distance based weighted sparse representation for coarse-to-fine plant species recognition
Leaf based plant species recognition plays an important role in ecological protection, however its application to large and modern leaf databases has been a long-standing obstacle due to the computational cost and feasibility. Recognizing such limitations, we propose a Jaccard distance based sparse representation (JDSR) method which adopts a two-stage, coarse to fine strategy for plant species ...
Hypothesis testing of genetic similarity based on RAPD data using Mantel tests and model matrices
Clustering and ordination procedures in multivariate analyses have been widely used to describe patterns of genetic distances. However, in some cases, such as when dealing with Jaccard coefficients based on RAPD data, these techniques may fail to represent genetic distances because of the high dimensionality of the genetic distances caused by stochastic variation in DNA fragments among the unit...
Combining Mahalanobis and Jaccard Distance to Overcome Similarity Measurement Constriction on Geometrical Shapes
In this study Jaccard Distance was performed by measuring the asymmetric information on binary variable and the comparison between vectors component. It compared two objects and notified the degree of similarity of these objects. After thorough preprocessing tasks; like translation, rotation, invariance scale content and noise resistance done onto the hand sketch object, Jaccard distance still ...
Multivariate Stream Data Classification Using Simple Text Classifiers
We introduce a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes as input a sliding window of multivariate stream data and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a simple text classification algorithm to clas...
A Test of Homogeneity for Two Multivariate Populations
The classical tests of homogeneity, such as the two-sample Kolmogorov-Smirnov test, do not have a natural extension to comparing two multivariate populations. G. J. Székely and N. K. Bakirov have proposed a new test based on Euclidean distance between sample elements. This test can be applied to testing homogeneity of any two multivariate populations with finite second moments, and the test is r...